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A FAST EXACT SIMULATION METHOD FOR A CLASS OF 
MARKOV JUMP PROCESSES 

YAO LI AND LILI HU 


Abstract. A new method of the stochastic simulation algorithm (SSA), named 
the Hashing-Leaping method (HLM), for exact simulations of a class of Markov 
jump processes, is presented in this paper. The HLM has a conditional constant 
computational cost per event, which is independent of the number of exponential 
clocks in the Markov process. The main idea of the HLM is to repeatedly imple¬ 
ment a hash-table-like bucket sort algorithm for all times of occurrence covered 
by a time step with length r. This paper serves as an introduction to this new 
SSA method. We introduce the method, demonstrate its implementation, analyze 
its properties, and compare its performance with three other commonly used SSA 
methods in four examples. Our performance tests and CPU operation statistics 
show certain advantage of the HLM for large scale problems. 


1. Introduction 


Since the late 1960s, much effort has been devoted to the simulation of Markov 
jump processes on high dimensional state spaces. Most of these Markov jump pro¬ 
cesses arise in two classes of problems. The first is usually called chemical re¬ 
action networks, which model a fixed number of chemical reactions under dilute, 
well-mixed conditions. It is well accepted that when the number of molecules is 
small, due to stochastic effects, a deterministic differential equation fails to model 
real-world chemical reactions accurately. Therefore, numerous chemical reaction 
systems within biological cells, such as gene networks, regulatory networks, and sig¬ 
naling pathway networks, are modeled by Markov jump processes. The second type 
of problems are related to the kinetic Monte Carlo (KMC) method 30 , which essen¬ 
tially covers all stochastic evolution models that proceed as a sequence of infrequent 
transitions at heterogeneous, state-dependent exponential random times. The KMC 
was first introduced to simulate radiation damage [^. Today, it is used to gener¬ 
ate stochastic trajectories appearing in surface/crystal growth, chemical/physical 
vapor deposition, vacancy diffusion, communication networks, and factory schedul¬ 
ing 


Markov jump processes coming from both the KMC and chemical reaction net¬ 
works have some common features. They are all driven by finitely many independent 
exponential clocks. The state of the Markov process is updated when a clock rings, 
called an “event”. The rate of each clock depends on the current state of the pro¬ 
cess. Therefore, the update that follows an event may change the rate of other 
exponential clocks. In most applications, the update is a simple transformation. 
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under which the rates of most clocks remain unchanged. A variety of Markov jump 
processes in statistical physics, such as the kinetic Ising model, the simple inclusion 
process (SIP), the simple exclusion process (SEP), and many of their variants like 
the TASEP, ASEP, etc. , can also be categorized into this family. 

The scale of these Markov jump processes can be very large. For example, some 
chemical reaction networks have thousands of reactions, while the scale of some 
reaction-diffusion systems can be as large as several million. Therefore, it is impor¬ 
tant to design fast algorithms for those large scale problems. As a Markov jump 
process proceeds sequentially at a series of state-dependent random times, the fun¬ 
damental rule of an exact simulation is always to identify the time and the index 
of the next occurring event, to update the state accordingly, and then to advance 
the time. Algorithms following this strategy are called Stochastic Simulation Algo¬ 
rithms (SSA). Early methods of the SSA like Gillespie’s direct method (DM) [I^ , 
the first reaction method (FRM) [^, and the BKL algorithm for the KMC rely 
on a linear search of times of occurrence of events. More sophisticated methods, like 
the next reaction method (NRM) and the composition-rejection method (CRM), use 
a binary heap or other mechanisms to sample the next occurring event [l6|[2^ . The 
performance of a stochastic simulation algorithm is usually measured by computa¬ 
tional cost per event. Let M be the number of exponential clocks in the Markov 
jump process. Then the computational cost per event is 0{M) for early methods 
like the DM, the FRM, and the BKL, O(logM) for more recent methods like the 
NRM, and conditional 0(1) for the CRM. Besides various methods of SSA, there 
are also approximate algorithms such as the tan-leaping algorithm and its numerous 
variants 


A large family of enhanced methods of the SSA are also developed for more spe¬ 
cific problems in different applications [^[8,10,26,27,31,32 . Important methods 


that are worth to mention include: the multi-scale stochastic simulation algorithm 
(MSSA) for chemical reaction networks with multiple time scales [6,31 , the opti¬ 


mized direct method (ODM) for exact simulations of chemical reaction networks 
with very heterogeneous reaction rates [s], and the next subvolume method (NSM) 


for stochastic reaction-diffusion systems 14 


The aim of the present paper is to introduce the Hashing-Leaping method (HLM), 
which is a novel method of the SSA with conditional 0(1) computational complexity 
per event. Motivated by the bucket sort algorithm, we repeatedly leap forward the 
time by a constant r, then use a hash-table-like algorithm to distribute random times 
covered by the leaping step into Q buckets. Each bucket corresponds to a period of 
time with length t/Q. Under some general assumptions about the Markov process, 
the average number of events in each bucket is 0(1) for suitable Q and r. Then 
we sequentially update all events in each bucket until the next leaping step. It is 
not difficult to check that the average computational cost per event is 0(1) when 
r ~ 0(1) and Q ~ 0{M). This is further confirmed by our numerical simulations. 

The performance of the HLM is tested in four numerical examples and compared 
with that of the DM, the NRM, and the CRM. The number of clocks M in the 
first three examples ranges from tens to millions. The last example is a chemical 
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oscillator with five reactions called the “Oregonator”. Numerical simulation results 
show significant advantage of the HLM over the other tested SSA methods when M 
is large. For small scale problems, the HLM remains competitive and is significantly 
faster than the CRM, which is the only other existing conditional 0(1) method to 
the best of our knowledge. 

As a novel method of the SSA, there are many extensions that are beyond the 
scope of the present paper. Promising extensions include methods to adjust param¬ 
eters during the simulation, the parallelization, extensions to multi-scale problems, 
and various applications of the HLM. These topics will be included in our subsequent 
works. 

The organization of this paper is as follows. In Section 2, we will describe the 
family of Markov jump processes that we are interested in, which covers Markov 
processes derived from chemical reaction networks and the KMC. Section 3 presents 
a short review of mainstream SSA methods. The HLM is introduced and analyzed 
in Section 4. Section 5 focuses on performance tests of the HLM on various models. 
Section 6 is the conclusion. 

2. Description oe Markov jump process model 

We first give a generic description of the Markov jump process to be studied 
in the present paper. Let be a Markov jump process on that 

is determined by M random times, which are generated by mutually indepen¬ 
dent exponential clocks. The rates of those clocks are state-dependent, denoted 
by Ri{Xt), • • •, Rm^Xi), respectively, where Ri : —)■ are called rate functions. 

Throughout this paper, all rate functions are assumed to be time independent. An 
update transformation TA : —)■ is associated with each exponential clock, 

where Ui G is a random parameter whose probability measure is pi{dui), and Dj 
is the sample space. When the i-th clock rings, called an “event”, Xt is updated 
by the random transformation During the update, a random parameter Ui is 
sampled from the probability measure Pi, independent of everything else. After the 
update, Xt jumps to a new state Xt+ = Tf^[Xt). Throughout the present paper, 
unless specified otherwise, M means the number of exponential clocks, which is said 
to be the scale of the Markov jump process. 

The dependency graph of Xt is a directed graph G = (K, E) with M vertices 
representing M exponential clocks, {f, j} forms a directed edge if and only if Rj ^ 
Rj o Tj. In other words, {i,j) is an edge if and only if the Ath clock affects the j-th 
clock. 

It is easy to check that Markov jump processes arising in a very large family 
of applications, including statistical mechanics, chemical reaction networks and the 
KMC, fit the description of Xt- In those applications, N and M can be very large 
numbers, but Ri only depends on a limited number of coordinates of Xt, and R is 
the identity transformation on all but a limited number of coordinates. Therefore, 
the dependency graph is usually sparse, which means the maximum out degree of G 
is independent of M. For example, for a stochastic chemical reaction network with 
N chemical species and M reactions, we have Xt = where x\ represents the 
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number of molecules of the i-th. reactant species. The rates of reactions, denoted 
by {i?i, • ■ ■, Rm}-! are determined by the population of reactant species. The j-th 
transformation is Tj{Xt) = Xt + Vj, where the vector Vj G is a sparse vector 
with all zero entries except those corresponding to reactant species that get changed 
in the j-th reaction. 


3. A SHORT REVIEW OF EXISTING SSA METHODS 


3.1. Direct method and first reaction method. As explained in the introduc¬ 
tion, when simulating At, it is important to note that those M clocks are mutually 
independent on a time interval only if all rates Ri remain unchanged. When one 
clock rings, the corresponding update transformation Tj will change the rates of 
other clocks. With a small but positive probability, this will lead to a “chain reac¬ 
tion” of events and change the rates of all clocks in a short time frame. Therefore, 
to simulate A^, the strategy of the SSA is to always identify the next event. 

The hrst two popular methods of the SSA for Markov jump processes like A* were 
introduced by Gillespie 17,18 , called the direct method (DM) and the hrst reaction 
method (FRM), respectively. In the simulation of the kinetic Ising model, the DM 
is also called the BKL algorithm, which was developed independently [^. In the 
DM, two random variables are generated to sample each event. The hrst random 
variable determines the time of occurrence of the next event, and the second one 
is used to sample the index of the next event together with a linear search. In the 
FRM, a linear search is used to sample both the time of occurrence and the index of 
the next event at the same time. Due to the linear search, the computational costs 
of both methods are 0{M) per event. These methods can be summarized as follow. 

Direct Method 


1: Initialize RiS. Initialize Aq. Let Rsum = Ri t = 0. 

2: Generate an exponential random variable with rate Rsum, denoted by At 
3: Let t = t + At. Generate a uniform random variable on [0,1], denoted by u 
4: Find the minimum I such that uRgum < Ri 
5: Update the state according to R: Xt+ = Ti{Xt) 

6: Recalculate all rate functions and Rsum 
7: Return to 2 or quit 
First Reaction Method 


1: Initialize RiS. Initialize Aq. Let t = 0 

2: Generate M exponential random variables ti, - ■ ■ Rm with rates Ri, • • ■, Rm, 
respectively. 

3: Use linear search to find the minimum, denoted by L. Let t = t + ti 
4: Update the state according to R: Xt+ = Ti{Xt) 

5: Recalculate all rate functions 
6: Return to 2 or quit 

It is a standard exercise to show that these two methods are equivalent. The DM 
has many variants, such as the optimized direct method (ODM) introduced by Cao 

et al [^. 
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3.2. Next reaction method. The FRM was significantly optimized by Gibson 


and Brnck 16 , called the next reaction method (NRM). Main improvements of the 


NRM inclnde 


• Introdncing the concept of the dependency graph. Only rate fnnctions af¬ 
fected by an event will be npdated 

• Reusing times of occurrence of events without regenerating random variables 

• Using a minimum binary heap to reduce the search time to 0(1) and the 
update time to O(logM) 

The NRM reduces the average computational cost per event to O(logM). Cur¬ 
rently, this method is widely used in stochastic simulation packages and commercial 
software. We will also compare the performance of our new method with that of the 
NRM. 

The NRM can be summarized as follows 


1: Initialization: 

(a) Initialize RiS. Initialize Xq. 

(b) Construct the dependency graph G 

(c) Generate M exponential random variables ti, - ■ ■ Rm with rates Ri, ■ ■ ■, Rm, 
respectively 

(d) Store times of occurrence into a minimum binary heap 

2: Find the event on the top of the minimum binary heap, denoted by ti 
3: Update the state according to R: Xt+ = Ti^Xf) 

4: Follow the dependency graph to update all affected rate function RiS 
5: Update times of occurrence of all affected events as 


(3.1) 


old 


R' 

^new ^ r^old y ^ ^ 

t \ z w TDnew ^ ’ 


6 : 


7: 


p^new 

and maintain the binary heap 

Advance ti by an exponential random number with rate Ri and maintain the 
binary heap 
Return to 2 or guit. 

Equation (3.1) is also adopted by the CRM 28 and the HLM introduced in this 
paper. As will be explained in Section 4.2, equation (3.1) comes from the property 
of the exponential distribution. We remark that this transformation formula only 
applies to time independent rate functions. Equation (3.1) will be different if some 
RiS are time varying, see [16] for the detail. 


3.3. Composition-Rejection method. The idea of the composition-rejection method 
(CRM) [^ comes from a simple probabilistic fact implicit in Gillespie’s direct 
method. Suppose we have M mutually independent exponential random variables 
hi, • • •, Ym with rates i?i, • • •, Rm, respectively. Let Emm be the minimum of the M 
random variables and Rsum be the sum of the M rates. Then TAm is an exponential 
random variable with rate Rsum- In addition, 

nYrmn = Yi\ = . 
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Therefore, sampling Yi = Ymin is equivalent to sampling a weighted distribution 
over {!,■■■ ,M} with weights Ri/Rgum, ■ ■ ■, Rm/R sumi respectively. Instead of the 
linear search used in the DM or the FRM, this sampling can also be done by the 
following rejection-based method. Let Rmax = Taax{Ri, ■ ■ ■, Rm}- Two random 
numbers Zi and Z 2 are repeatedly generated until Y^in is selected, where Zi is 
uniformly distributed on [0, Rmax], and Z 2 is an integer that is uniformly distributed 
over {1, • • •, M}. The rule of the rejection is that: if Zi > Rz^, then the pair {Zi, Z 2 ) 
is rejected. Otherwise it is accepted and we have = ^Z 2 - 

The rejection-based sampling method has constant computational complexity in¬ 
dependent of M. However, the constant can be very large if Rmax is much larger 
than all the other i?jS. This is partially solved by the CRM. The CRM relies on 
the maximal and minimal rates Rmin and Rmax- To reduce the expected number 
of rejections, Ri,---,Rm are distributed into g > [log 2 (Rmax/Rmin)1 groups. The 
hrst group contains rates ranging from Rmin to 2Rmin, the second group from 2Rmin 
to ARmin, and so on. According to 28 , (/ < 30 is sufficient for most applications. 
Sums of rates in each groups are calculated, denoted by pi, ■ ■ ■ ,Pg, and stored. The 
CRM uses Gillespie’s direct method to sample the group (composition), and uses 
the rejection-based sampling technique introduced above to select the event (rejec¬ 
tion) within the sampled group. After updating an event, Rsum, Rmax, Rmin and all 
affected groups will be maintained. 

The CRM reduces the computational cost per event to conditional 0(1) when 
M is sufficiently large. See the performance test in Section 5.2. The requirement 
of the 0(1) complexity is (i) the dependency graph G is sparse and (ii) the ratio 

Rmax /R 

min is 0(1). 

The CRM can be summarized as follows: 


1 : 


2 

3 

4 

5 

6 

7 

8 
9 

10 


Initialization: 

(a) Initialize RiS. Initialize Xq. Let f = 0 

(b) Construct the dependency graph G 

(c) Distribute M clock rates into g groups and compute pi, - ■ ■ ,Pg, the sum 
of rates in each groups 

(d) Calculate Rsum = Yh=i Ri 

Cenerate an exponential random variable with rate Rsum, denoted by At 
Let t = t + At. Cenerate a uniform random variable on [0,1], denoted by u 


Find the minimum k such that uRsum < Pi 

Use the rejection-based sampling method to select an event Ri from group pk 
Update the state according to Ti: Xt+ = Ti{Xt) 

Follow the dependency graph to update all affected rate function RiS 
Update times of occurrence of all affected events as (3.1) 

Maintain the groups, update R, 

Return to 2 or guit. 


‘-sum , Rmax, O^nd Rmin 


and affected PiS 


4. Hashing-Leaping method (HLM) 

4.1. Introduction to the method. In this section, we introduce a conditional 
0(1) per-event SSA method for the exact simulation of Xt- As reviewed in the 
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previous section, the main bottleneck of simulating Xt is sampling the “next event”. 
Instead of a linear search or a heap sort, the HLM is motivated by the bucket sort 
algorithm, which is a linear complexity sorting algorithm in most practical settings. 
To simulate Xt, two parameters r and Q are chosen by either observing rate function 
RiS or performing a smaller scale simulation, where r > 0 is the step size and the 
positive integer Q is the number of buckets. 

The HLM runs in the following way: Same as in the NRM, times of occurrence of 
events associated with M clocks, denoted by ti, • • •, tM, are stored and maintained. 
The algorithm makes a major update in the beginning of every time step with 
length r, called a bucket redistribution. In the n-th bucket redistribution, ti, - ■ ■ ,tM 
are distributed into Q + I buckets, denoted by Bi, • • •, Bq, Bi, that represent time 
intervals 

Bi = [{n-l)T , {n-l)T+ t/Q) 

B 2 = [{n-l)T+ t/Q , {n-l)T+ 2 t/Q) 

Bq = [(n-l)r+(Q - l)r/Q, nr) 

Bl = [nr , + 00 ), 

respectively. Then we start from the hrst non-empty bucket B^ to hnd the 
minimum time of occurrence, say f;, by a linear search, and make update according 
to Ti. During the update, the following two operations will be carried out: (i) The 
dependency graph is followed to update the new clock rates of all affected clocks, as 
well as the corresponding times of occurrence, (ii) An exponential random variable 
with rate Ri will then be generated and added to ti, which is the time of occurrence 
of the next event associated with Ri. ti will then be placed into the proper bucket. 
We repeat this step until Bn^ is emptied. 

Then we move to the next nonempty bucket and carry out the same series of 
operations. This procedure continues until all buckets Bi, - ■ ■, Bq are emptied. At 
that time, times of occurrence of all events are stored in Bl, which will be used 
to perform the next bucket redistribution. We call it the Hashing-Leaping method 
because the bucket redistribution step resembles the Hash algorithm, while the whole 
algorithm can be seen as an exact version of the tau-leaping algorithm. 

The HLM should be implemented with the proper data structure to improve the 
efficiency. The simplest way we hnd is to construct an array of N structs, named the 
TimeArray. Each struct, of the type ST, has three elements: a hoating point num¬ 
ber that indicates the time of occurrence of the event associated with the exponential 
clock, and two ST pointers pointing to its left and right neighbors, respectively. In 
addition we need an array of Q -|- 1 ST pointers, called the BucketArray, that rep¬ 
resents the heads of the Q + 1 buckets. A hoating point number array RateArray 
is also needed to store the rate of each clock. 

After a bucket redistribution, each bucket is formed by a doubly linked list whose 
head is pointed to by an element in the BucketArray, as shown in Figure When 
updating each bucket, a linear search is performed to hnd the minimum time of 
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occurrence within this bucket. Every update after an event requires two operations: 
(i) remove the corresponding ST struct from its old bucket, i.e., set its left and right 
pointers to NULL and maintain the doubly linked list; and (ii) push the struct into 
the front of the new bucket, i.e., relink its left and right pointers. To increase the 
efficiency, if an ST struct remains in the same bucket after an update, only its time 
of occurrence will be changed. It is a simple practice to implement the HLM in 
C/C++. 

We remark that according to our test, it seems to be less efficient to implement 
buckets as linear arrays than to implement them as linked lists. Although linear 
arrays are more cache friendly than linked lists, we have to frequently move structs 
from one bucket to the other instead of just relinking pointers. In addition, to main¬ 
tain the linear array data structure, when removing one element from the bucket, 
the last element has to be moved to £11 the empty slot. As a result, we observed 
~ 10% decrease of the performance when implementing buckets as linear arrays. 


BucketArray 


B 


0 




Bl 


TimeArray 




0.1 


0.6 


TT 


0.2 


0.9 


JT 


1.1 


■»NULL 


0.4 


NULL 


■>NULL 


Figure 1. TimeArray and BucketArray for M = 6 , Q = 2 and r = 1. 

The TimeArray stores times of occurrence of all 6 clocks, labeled U to 
tQ. Three buckets stored in the BucketArray, labeled Bq, Bi, and Bl, 
contain times of occurrence between 0 and 0.5, between 0.5 and 1.0, 
and greater than 1.0, respectively. Arrows represent the direction of 
pointers 

The HLM can be summarized as follows 
1: Initialization: 

(a) Initialize RiS. Initialize Xq. Let t = 0. 

(b) Generate the dependency graph G. 

(c) Generate times of occurrence R, - ■ ■ ,tM- 

(d) Initialize TimeArray, RateArray, and BucketArray 
2: Ghoose proper parameters t and Q. 

3: Distribute times of occurrence to corresponding buckets BucketArray[0] ~ 
BucketArray [Q]. 

4: For i = 0 to Q — 1 

While BucketArray[i] is nonempty: 
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5: 


(a) Find the least time of occurrence ti within this bucket 

(b) Make update according to Ti 

(c) Follow the dependency graph G to update all affected rate functions in 

RateArray 

(d) Update time of occurrence of all affected events as (3.1) and move af¬ 
fected elements in TimeArray to the corresponding buckets. 

(e) Advance ti by an exponential random number with rate Ri and move 
TimeArray/// to the corresponding bucket 

Update the time intervals of each bucket and return to 3, or guit. 


4.2. Analysis of algorithm. It is important to demonstrate the correctness of the 
HLM before further investigations. In fact, the HLM is mathematically equivalent 
to the NRM. This can be checked by using the following three steps. 

1: Sampling events Sampling the next time of occurrence is the core step of 
the SSA. In the HLM, the next time of occurrence is chosen as the minimum 
of M random times, which is equivalent to the NRM and the FRM. 

2: Reusing random times. After initialization, a sample of the time of occur¬ 
rence of each clock is taken from the corresponding exponential distribution. 
If an event occurs at time t without changing the rate of this clock, this 
sample can be reused due to the time-invariant nature of the exponential 
distribution. More precisely, if Z is an exponentially distributed random 

Z\z>t, has the same ex- 


for the full mathematical 


variable, then for any t > 0, the “overshoot”, i.e. 
ponential distribution. We refer Theorem 1 in (Ib) 
detail. 

Changing clock rates. The update transformation in an event may change 
the rate of other clocks. Therefore the affected times of occurrence need to 
be updated accordingly. 

Assume after an event at time t, the rate of the i-th clock changes from 
to By step 2, the “overshoot” ti—t has an exponential distribution with 

rate Rf'^. In addition, it is a simple probabilistic fact that for any constant 
a > 0 and any exponential random variable Z with rate r, aZ is exponentially 
distributed with rate r/a. Hence the transformation in equation (3.1) maps 
an exponential distribution with rate Rf'^ to an exponential distribution with 
rate For further reference regarding the proof, see Theorem 2 in 16 


4.3. Analysis of complexity. 

(1) Average computational cost. 

Assume 

(a) There exists a constant K independent of M such that 

E [Number of events occurring on [t, t -|- A]] < KM A 
for any t and any A > 0; and 

(b) The dependence graph is sparse such that the maximum out degree is 

0 ( 1 ). 
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Then if r ~ 0(1), 0{M) events occur in each time step with length r. The 
computational cost of a bucket redistribution is M. If we choose Q ~ 0{M), 
then the number of events in each bucket can be (very) roughly approximated 
by a Poisson distribution with 0(1) mean. Since the dependency graph is 
sparse, the average cost of updating each bucket is 0(1). Therefore the total 
computational cost in one step is 0(M) + Q x 0(1) = 0{M). This makes 
the average complexity of the HLM be 0(1) per event. 

If the dependency graph is not sparse in a way that the average out degree 
of vertices is D, which is possibly dependent of M, then the average cost of 
updating each bucket is 0(0). This brings the total computational cost in 
one step to 0{DM) and the average complexity per event to 0(0). 

We remark that assumptions (a) and (b) are satished by a large class of 
models in practice. In particular, if Rmax ~ 0(1) and Rmax/Rmin ~ 0(1), 
then (a) is satished. 

(2) Worst case analysis. In the worst case, which means all events are 
distributed into the same bucket, the HLM is equivalent to the FRM. There¬ 
fore, the computational cost per event in the worst case is 0(M). However, 
we remark that in most practical cases, this worst situation occurs with an 
extremely low probability. If Rmax ~ 0(1) and Rmax/Rmin ~ 0(1), when 
letting Q ~ 0{M) and r ~ 0(1), the probability that all M events are 
placed into a single bucket is ~ 


4.4. Discussion of issues. 

(1) Choice of parameters. 

Finding optimal parameters for the HLM is a challenging job. We make 
some idealized calculations to shed some light on this problem. Assume (a) 
and (b) in Section 4.3 hold. Then the average computational cost per event, 
denoted by C, on this time step satishes 


C = 


search cost + update cost 


4/events 

+bucket iteration cost + bucket redistribution cost 


aMrX ^ 2 

-\-CiQ -\- CrM 


,aMT.^ aMr 
+ 2 


Q 


Q 


+ CuO:Mt 


where Cg, Cu, Ci, and O^ represent the average cost of searching, updating 
events, bucket iteration, and bucket redistribution. Therefore, it is easy to 
check that for each hxed r, the optimal Q satishes 



Qopt = aMr ■ 
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When Q is optimal, we have 

c = ./2CsQ + a + a + —. 

ar 

It is reasonable to assume Cg, Ci, and Cr are constants. But Cu depends on 
r because we do not relink pointers if an update does not move the affected 
event to a new bucket. For smaller r, larger proportion of events will remain 
in the bucket Bl throughout the step. Hence a more precise model of Cu is 

Cu = Cu{T)=C: + C:(l-e-n. 

where C'^ and C" are constants that represent the cost of update without 
relinking pointers and the cost of relinking pointers. 

Therefore, we conclude that the optimal number of buckets Q is in pro¬ 
portion to both r and M, and the optimal value of r depends on constants 
C", Cr, and a. 

One important remark is that the empirical performance of the HLM is 
not sensitive with respect to small change of parameters. The reason is that 
updating an event, which includes calculating new rate functions, modifying 
times of occurrence, and relinking pointers, is much more expensive than 
the other operations. Hence Cu is signihcantly greater than the other three 
constants. Similarly, C'^ is greater than O". According to our test, the 
performance of the HLM is stable as long as Q ~ 0{M) and r ~ 0(1). For 
example, for the generalized KMP model introduced in Section 5.1, the CPU 
time with t = 1,Q = M and that with t = 1,Q = M/10 have only less than 
10% difference. 

When simulating models in Section 5, we choose smaller r such that about 
half of events are placed into Bl, then make Q be in proportion to r and M. 
We admit that these parameters may not be optimal. To obtain the optimal 
parameters, all constants introduced above should be estimated empirically. 
In other words, the algorithm needs to be enhanced such that r and Q can be 
properly adjusted as the simulation proceeds. We will make this extension 
in our subsequent works. 

(2) Possible improvements. 

There are two places to further improve the HLM. The first potential im¬ 
provement is at the level of implementation. One can replace the linear 
search in each bucket with a binary search. Under our assumption, the ex¬ 
pectation of the largest number of events stored in a bucket is O(logM). 
Therefore, a binary search could potentially increase the performance of the 
algorithm. As mentioned before, it is slightly less efficient to implement 
buckets as linear arrays than to implement them as linked lists. However, we 
expect some improvement if buckets are implemented as linear arrays, with 
binary heaps constructed on them. ( It is standard knowledge that imple¬ 
menting a binary heap as a linked list is signihcantly more complicated than 
implementing that as a linear array. ) We did not present this improvement 
in the present paper because it does not change the 0(1) computational 
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complexity per event of the HLM. Plus due to the overhead of constructing 
the binary heap, we expect the improvement can only be observed for very 
large scale problems. 

The second improvement is for Markov jump processes with sparse depen¬ 
dency graphs. Assuming Rmax/Rmin ~ 0(1)) when the dependency graph is 
sparse, times of occurrence stored in each bucket will not affect each other 
with high probability (~ 1 — 0(1/M)). Therefore, in most situations we 
can update times of occurrence in each bucket sequentially from the head 
of the list without searching for the minimum. Since in average a bucket 
only contains 0(1) events, this improvement does not change the compu¬ 
tational complexity of the algorithm either. In fact, we only observed a 
minor increase of simulation speed at the cost of much more complicated 
programming code. However, this idea can be used in the parallelization of 
our algorithm, as explained in (3). 

(3) Parallelization. 

Although the fact that events in one bucket are “almost independent” 
does not signihcantly improve the performance of the HLM when running 
on a single CPU, it can be used to parallelize this algorithm. In fact, the 
HLM is very compatible with parallel programming if the dependency graph 
is sparse. The idea is to make the number of buckets Q ~ 0{M/p), where 
p is the number of CPUs, and divide one bucket into p sub-buckets. During 
the bucket redistribution, we can evenly distribute events into sub-buckets 
that are maintained by different CPUs. Suppose the dependency graph is 
sparse, then events placed in each bucket will not affect each other with 
high probability. Therefore in most of the running time, different CPUs can 
update events in their own sub-buckets independently. 

The principle of the parallelization is straightforward. However, there are 
lots of details remaining to be studied. With a low but strictly positive 
probability, an event in one bucket can affect events in the same bucket 
(but probably different sub-bucket), which will cause intensive communica¬ 
tion between CPUs. Therefore, it is crucial to design algorithms that can 
identify and deal with these small probability events with the lowest overall 
overhead. We will study the parallel HLM and its applications intensively in 
our subsequent works. 


5. Numerical Examples 

5.1. Introduction to models. We chose the following four models to test the 
performance of the HLM. 

(i). Generalized KMP model. There are numerous stochastic processes in 
the held of statistical mechanics that ht the generic description of Xf in Section 2, 
such as the simple inclusion process (SIP), the simple exclusion process (SEP) and 
its variants [3, 11,12, and many stochastic processes that models the microscopic 
heat conduction |4, 13,24 . We chose the following generalized KMP model as a test 


case of our simulation. The KMP model is a stochastic model proposed in 21 that 


















A FAST EXACT SIMULATION METHOD EOR A CLASS OE MARKOV JUMP PROCESSES 13 


models the microscopic energy transport, from which a microscopic derivation of 
Fourier’s law was carried out. The main feature of the generalized KMP model ( 
See ^ 23 for details ) here is its energy dependent clock rates, which makes the 


simulation non-trivial. 

The KMP model models the energy transport in a ID chain of M oscillators cou¬ 
pled with two heat baths whose temperatures are 71 and Tr, respectively. We only 
take note of energy carried by each oscillator, denoted by Xi, - ■ ■ ,xm, respectively. 
The Markov chain generated by the KMP model can be described as follows. An 
exponential clock with rate R{xi,Xi^i),i = 0 ~ M is associated with each pair of 
adjacent oscillators (let Xq = 71 and xm+i = Tr). When the clock rings, the energy 
stored in the corresponding pair of oscillators is pooled together, repartitioned ran¬ 
domly, and redistributed back to the oscillators. The energy redistribution satisfies 

(x',x'+i) = {p{Xi+Xi+i), {1 - p){Xi + Xi+i)) , 

where p is a uniform random variable on (0,1) that is independent of everything 
else, and x' is the energy carried by oscillator i after the update. If clock 0 (resp. 
clock M) rings, the energy of the first (resp. last) oscillator exchanges energy with 
an exponential random variable with mean 71 (resp. Tr) in the same way. 

We let 71 = 1.0, Tr = 2.0, and 7?(xi,Xj+i) = ^/xiTTi^ and simulate the gener¬ 
alized KMP model with varying Ms. 

Markov jump processes arising in other statistical mechanics models may not 
satisfy the generic description of Xt in Section 2. However, they can still be simulated 
by the HLM efficiently. One example is the random halves model proposed in 13 


and its modihcation in 24 , in which the number of clocks varies randomly with time 
instead of being a constant. When the number of exponential clocks changes by one, 
the cost of maintaining the bucket data structure is 0(1). In contrast, in the NRM, 
the computational cost of maintaining the binary heap after adding/removing one 
exponential clock is O(logM). 

(ii). Chemical reaction network The second example we chose is a chemi¬ 
cal reaction network. Stochastic chemical reaction networks are a very important 
class of Markov jump process. It is well known that most SSA methods were origi¬ 
nally developed for the stochastic simulation of chemical reactions. To simplify the 
implementation, we use the same example in 


28 with minor modification. 


Consider a large chemical reaction network with M reactions. For the sake of sim¬ 
plicity we only take note of rate functions, or the propensity. The initial propensities 
are uniformly distributed over [0, 2]. The dependency graph is randomly generated 
in a way that each reaction affects m other reactions, where m is a random integer 
uniformly distributed over {l,---,30}. After the occurrence of each reaction, all 
affected reactions are updated in a way that the propensity is replaced by a new 
random number uniformly distributed on [0, 2]. We note that this example is dif¬ 
ferent from real-world chemical reaction networks, and is used only for the purpose 
of testing algorithms. 

The main difference between our example and the example presented in 
that we do not bound the ratio of Rmax to Rmin- In the example used by 


28 


IS 
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the ratio of Rmax to Rmin is bounded by 1 x 10® at the initial state and has minor 
changes when the simulation proceeds. 

(iii). Reaction-diffusion system The third example we chose is the Gray- 
Scott model, which is a reaction-diffusion system in 2D. Since the purpose of this 
simulation is to compare the performance of algorithms, we simplify some details. 
The Gray-Scott model is well known as its pattern formation phenomenon |^, as 
seen in Fig[^ The reaction part of the model involves two species U and V and four 
reactions 


U + 2V 
U 
V 
0 


Ki{n) 

— ^ 


31/ 



Kf ito(O)^ 


0 

0 

U 


where G is a parameter that indicates the scale of the system per subvolume. D 
is determined by both the molecular population level and the edge length of sub¬ 
volumes, which are assumed to be constants throughout this example. Parame¬ 
ters /ci(D) = kiQ~'^ and mo(G) = uoG depend on G. ki, Kj, K 2 ,uo are constants. 
Molecules are assumed to be well-mixed in each subvolume. Besides reactions, U 
and V molecules can jump to neighbor subvolumes at certain diffusion rates, de¬ 
noted by constants Du and Dy, respectively. Du and Dy also depend on the edge 
length of subvolumes. 

In the simulation, we study domains that are consist of K x K subvolumes for K 
ranging from 3 to 1200. Numbers of molecules of species U and V in each subvolume 
are denoted by Dij and Vjj, respectively, where i,j ranges from 1 to K. There are six 
exponential clocks associated with each subvolume with rates UijKf, 

VijKf, KfUo{fl), UijDu, and Vi^Dy, respectively. When the diffusion clock rings, 
one corresponding molecule in the subvolume moves to one of its nearest neighbors 
with equal probability. A molecule exits from the system if it moves out of the 
domain. Hence the scale of this system is M = QK^. The values of parameters 
are taken as G = 250, Kj = 0.0055, K 2 = 0.0205, Dy = 0.002, Du = 0.001, and 

ki = Uq = 1 . 


(iv). Chemical oscillator system. 

To examine the performance of the HLM for small scale problems, we consider 
the following chemical oscillator system called the “Oregonator”, which was origi¬ 
nally introduced in 15 . The “Oregonator” is a chemical reaction system with hve 
reactions 


XI + Y2 


n 

YI + Y2 


Zi 

X2 + H1 


2W + H3 

2 W 


Z2 

X3 + F3 


Y2, 





A FAST EXACT SIMULATION METHOD EOR A CLASS OE MARKOV JUMP PROCESSES 15 



Figure 2. Pattern formation of the Gray-Scott model. Fignre 2 (a): 
Contonr plots of nnmbers of U molecules on 100 x 100 subvolumes at 
t = 1500.0. Figure 2(b): Contour plots of numbers of U molecules on 
200 X 200 subvolumes at f = 1500.0 

where the molecular population level of species Xi,X 2 , and X 3 are assumed to 
be constants. The parameters are taken to be CiXi = 2, C 2 = 0.1, C 3 X 2 = 104, 
C 4 = 0.016, and C 2 X 3 = 26, which are consistent with 

5.2. Performance test of HLM. (i). Speed of simulation 

The hrst set of numerical simulations concern the speed of algorithm. We im¬ 
plement each of the four models introduced in the previous subsection using four 
different methods: the CRM, the DM, the HLM, and the NRM. All SSA methods 
are implemented in C, with C++ I/O for the sake of simplicity of coding. Imple¬ 
mentations of all algorithms are optimized to the best of our ability. For example, 
in the implementation of the CRM, instead of dynamic group bounds varying with 
R-mim pre-assigned group bounds are used to reduce the overhead of maintaining 
groups. All performance tests are run on a 2012 Macbook Pro laptop with an Intel 
Core I7-3615QM CPU and 8 GB memory. 

Instead of stopping after simulating a fixed number of events, we chose to simulate 
all examples up to f = 10, as simulating Markov processes up to different times may 
bias the result. The performance of four SSA methods is measured in seconds per 
million events and compared. The scales of the generalized KMP model, the chemical 
reaction network, and the reaction-diffusion system ranges from M = 10 ~ 10 ^, 
M = 50 ~ 5 X 10®, and M = 54 ~ 8.64 x 10®, respectively. 

The parameters of the HLM are chosen as r = 0.2, Q = M/10 for the generalized 
KMP model, r = 0.1, Q = M/20 for the chemical reaction network, r = 0.5, Q = 
M/2 for the react ion-diffusion system, and r = 0.01, Q = 5 for the “Oregonator”. 
The parameter of the CRM is chosen as 5 ^ = 30 for all models. 
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In examples 5.1 (i) — (iii) , CPU times ( seconds per million events) vs. M of all 
four SSA methods are plotted in linear-log plots and presented in Figure and 
In these hgures, each plot represents the mean CPU times of 10 runs. The error 
bars indicate one standard deviation of the mean. The CPU times ( seconds per 
million events) of four SSA methods over 10 runs of the “Oregonator” are presented 
in Figure]^ The error bars also represent one standard deviation. 

As shown in these hgures, for large scale models, the HLM is superior to the 
other SSA methods in all tested examples. Although having the same computational 
complexity, we hnd that the HLM is slightly faster than the CRM in all examples 
and all system sizes. One possible reason is that maintaining groups in the CRM 
is more expensive than maintaining buckets in the HLM. The CRM also generates 
more random numbers than the HLM. For small scale models, we hnd that the 
performance of the HLM is slower than the DM (and sometimes the NRM) but 
remains competitive. 



Figure 3. CPU times in seconds per 10® events vs. number of expo¬ 
nential clocks for the generalized KMP model. Blue, black, red, and 
green plots represent the mean CPU times of the CRM, the HLM, the 
NRM, and the DM over 10 runs, respectively. The error bars indicate 
one standard deviation of the mean. 

(ii). Statistics of computer operations 

The running time of an algorithm depends on many things beyond the efficiency 
of algorithms. Details of implementation, the compiler, the operating system, and 
the size of CPU cache can signihcantly change the empirical CPU times. Therefore, 
it is important to study the average number of operations per event of the HLM. 

We collect data of computer operations of examples 5.1 (i) — (iii) . The number 
of comparisons when linearly searching a bucket and the number of moves of events 
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Figure 4. CPU time in seconds per 10® events vs. number of expo¬ 
nential clocks for the chemical reaction network. Blue, black, red, and 
green plots represent the mean CPU times of the CRM, the HLM, the 
NRM, and the DM over 10 runs, respectively. The error bars indicate 
one standard deviation of the mean. 

between buckets are recorded. Our results conhrm that, in all three examples, the 
HLM has constant computational cost per event. The total numbers of comparisons 
plus moves per event are ~ 5.0 for the generalized KMP model, ~ 10.5 for the re¬ 
action diffusion system, and ~ 27.5 for the chemical reaction network, respectively. 
As an example of 0(1) algorithm vs. O(logM) algorithm, the statistics of computer 
operations of the HLM vs. the NRM for the generalized KMP model are demon¬ 
strated in Figure in which the number of moves of events are further broken down 
into moves with relinking pointers and moves without relinking pointers (i.e. within 
the same bucket). The number of operations of the NRM shown in the hgure is the 
number of swaps in the binary heap. We hnd that the HLM is more efficient than 
the NRM when M is greater than 10^. 

6. Conclusion 

In the present paper, we introduced a fast method for the stochastic simulation 
algorithm (SSA), namely the Hashing-Leaping Method (HLM), for a class of Markov 
jump processes arising in many scientific fields. The common feature of these Markov 
jump processes is that they are driven by many heterogeneous, state-dependent 
exponential clocks. The number of exponential clocks, or the scale of the system, is 
denoted by M. As the Markov jump process proceeds at a sequence of random times, 
the main strategy of the SSA is to identify the next time of occurrence among many 
exponential random times. To do so, the HLM uses a hash-table-like data structure 
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Figure 5. CPU time in seconds per 10® events vs. number of expo¬ 
nential clocks for the react ion-diffusion model. Blue, black, red, and 
green plots represent the mean CPU times of the CRM, the HLM, the 
NRM, and the DM over 10 runs, respectively. The error bars indicate 
one standard deviation of the mean. 

to distribute times of occurrence covered by a certain time step with length r into 
Q evenly divided buckets, updates all buckets sequentially, and leaps forward by r. 
Under assumptions (a) and (b) in Section 4.3, the average computational cost per 
event of the HLM is 0(1), independent of the number of exponential clocks. 

For large scale Markov jump processes, the HLM has the desired performance. 
The speed of the HLM is tested with three large-scale models: a generalized KMP 
model, a chemical reaction network, and a stochastic reaction-diffusion system. Our 
simulation results showed advantages of the HLM over the DM, the NRM, and the 
CRM when M is greater than ~ 10^. In addition, we verihed numerically that the 
average number of computer operations per event of the HLM is 0(1). 

It is well accepted that no SSA method is unconditionally superior to the rest. For 
small-scale problems, Gillespie’s direct method (DM) usually has the best efficiency. 
This is conhrmed by our simulation result of a chemical oscillator model called the 
“Oregonator”. For this small-scale model (in which M = 5), the performance of 
the HLM remains competitive. While possibly due to the overhead of maintaining 
many groups, as the only other existing conditional 0(1) per-event SSA method to 
the best of our knowledge, the CRM is slower than the rest SSA methods when the 
scale of the Markov process is sufficiently small. 

We do not claim that the HLM is a perfect algorithm. One drawback is that 
the performance of the HLM depends on the choice of parameters. According to 
our analysis in Section 4.4, the optimal parameters depend on some constants that 
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Figure 6. CPU time in seconds per 10® events vs. number of expo¬ 
nential clocks for the “Oregonator” model. Heights of bars represent 
the mean CPU times of the CRM, the HLM, the NRM, and the DM 
over 10 runs, respectively. The red error bars indicate one standard 
deviation of the mean. 

should be estimated empirically. Although the performance of the HLM is not 
sensitive with respect to change of parameters as long as r ~ 0(1) and Q ~ 0{M), 
to reach the full efficiency of the HLM, some empirical estimations or small scale 
CPU time tests are be needed at the current stage. To partially solve this problem, 
in future, we will develop algorithms that can adjust parameters as the simulation 
proceeds. 

Overall, our analysis and numerical simulations show that the HLM is a promising 
SSA method, especially for large scale Markov processes. The performance of the 
HLM can be potentially improved in several aspects, from more efficient implemen¬ 
tations to parallelizations. It is also useful to trim the HLM for specihc problems, 
such as multiple time scale systems and systems with varying number of exponential 
clocks. Those issues will be studied in our subsequent works. 
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Figure 7. Number of operations per event of the generalized KMP 
model. Red: number of swaps per event of the NRM. Black: number 
of moves and comparisons per event of the HLM. Purple: number 
of moves (without relinking pointers) per event of the HLM. Green: 
number of moves (with relinking pointers) per event of the HLM. Blue: 
number of comparisons per event of the HLM. Parameters are same 
as in Section 5.2 (i). 
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